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Abstract. While there is no lack of efficient Krylov subspace solvers for Hermitian systems, 
there are few for complex symmetric, skew symmetric, or skew Hermitian systems, which are increas- 
ingly important in modern applications including quantum dynamics, electromagnetics, and power 
systems. For a large consistent complex symmetric system, one may apply a non-Hermitian Krylov 
subspace method disregarding the symmetry of A, or a Hermitian Krylov solver on the equivalent 
normal equation or an augmented system twice the original dimension. These have the disadvan- 
tages of increasing either memory, conditioning, or computational costs. An exception is a special 
version of QMR by Freund (1992), but that may be affected by non-benign breakdowns unless look- 
ahead is implemented; furthermore, it is designed for only consistent and nonsingular problems. For 
skew symmetric systems, Greif and Varah (2009) adapted CG for nonsingular skew symmetric linear 
^4 systems that are necessarily and restrictively of even order. 

We extend the symmetric and Hermitian algorithms MINRES and MINRES-QLP by Choi, Paige 
|y\ and Saunders (2011) to complex symmetric, skew symmetric, and skew Hermitian systems. In 

*vj particular, MINRES-QLP uses a rank-revealing QLP decomposition of the tridiagonal matrix from a 

three-term recurrent complex-symmetric Lanczos process. Whether the systems are real or complex, 
t singular or invertible, compatible or inconsistent, MINRES-QLP computes the unique minimum- 

Tf\ length, i.e., pseudoinverse, solutions. It is a significant extension of MINRES by Paige and Saunders 

k— — I (1975) with enhanced stability and capability. 
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^*>0 1. Introduction. Krylov subspace methods are generally divided into two classes: 

Those for Hermitian matrices (e.g. CG [27], MINRES [38], SYMMLQ [38], MINRES- 
QLP [8, 9, 11, 7]) and those for general matrices without such symmetries (e.g. 
BiCG [15], GMRES [41], QMR [19], BiCGstab [52], LSQR [39, 40]). Such a division 
is largely due to historical reasons in numerical linear algebra — the most prevalent 
structure for matrices arising from practical applications is that of being Hermitian 
(which reduces to symmetric for real matrices). However other types of symmetry 
structures, notably complex symmetric, skew-symmetric, and skew-Hermitian ma- 
trices, are becoming increasingly common in modern applications. Currently, aside 
possibly for storage and matrix-vector products, these are treated like any general 
matrices with no symmetry structures. The algorithms in this article go substantially 
further in developing specialized Krylov subspace algorithms designed at the outset to 
exploit these symmetry structures. In addition, our algorithms constructively reveal 
the (numerical) compatibility and singularity of a given linear system — users do not 
have to know these properties a priori. 
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We are concerned with iterative methods for solving a large linear system Ax = b 
or the more general minimum-length least-squares (LS) problem 



min \\x 2 s.t. 



x £ arg mm 



\Ax-b\\ 2l 



(1.1) 



where A £ C nx ™ is complex symmetric (^4 = A T ) or skew Hermitian (^4 = —A*), and 
possibly singular; and b £ C™. Our results are directly applicable to problems with 
symmetric or skew symmetric matrices A = ±A T £ M. nxn and real vectors b. A may 
exist only as an operator for returning the product Ax. 

The solution of (1-1), called the minimum-length or pseudoinverse solution [21], 
is formally given by X' — {A*A)^A*b, where A' denotes the pseudoinverse of A. 
The pseudoinverse is continuous under perturbations E for which rank (A + E) = 
rank (A) [46], and x' is continuous under the same condition. Problem (1.1) is then 
well-posed [24]. 

Let A — UTjU t be a Takagi decomposition [29], a singular- value decomposition 
(SVD) specialized for a complex symmetric matrix, with U unitary (U*U — I) and 
£ = diag([ CT i.--.°'n ]) real non-negative and o\ > a 2 > ■ ■ • > o> > 0, where r is 
the rank of A. We define the condition number of A to be n(A) = — , and we say 
that A is ill-conditioned if n(A) ^$> 1. Hence a mathematically nonsingular matrix 
(e.g., A = [J°], where e is the machine precision) could be regarded as numerically 
singular. Also, a singular matrix could be well-conditioned or ill-conditioned. For a 
skew Hermitian matrix, we use its (full) eigenvalue decomposition A — VAV* , where 
A is a diagonal matrix of imaginary numbers (possibly zeros; in conjugate pairs if A 
is real, i.e., skew symmetric) and V is unitary 1 . We define its condition number as 



k(A) 



[Ai| 

|A r | 



the ratio of the largest and smallest nonzero eigenvalues in magnitude. 



Example 1.1. We contrast the five classes of symmetric or Hermitian matrices 
by their definitions and small instances of order n = 2: 



l nxn 3A = A T = 
:" x " 9 A = A* = 

: nxn 3A = A T = 

i nXn BA = -A T = 

: nxn 3A = -A* = 



1 5 
5 1 

1 1-2* 

1 + 2i 1 

2 + i 1-2* 
1-2* i 



5 

-5 



is symmetric. 

is Hermitian (with real diagonal), 
is complex symmetric (with complex diagonal). 
is skew symmetric (with zero diagonal). 

is skew Hermitian (with zero diagonal). 



1-2?: 

-1-2* 



CG, SYMMLQ, and MINRES are designed for solving nonsingular symmetric sys- 
tems Ax = b. CG is efficient on symmetric positive definite systems. For indefinite 
problems, SYMMLQ and MINRES are reliable even if A is ill-conditioned. 

Choi [7] appeared to be the first work that comparatively analyzed the algorithms 
on singular symmetric and Hermitian problems. On (singular) incompatible problems 



1 Skew Hermitian (symmetric) matrices are, like Hermitian matrices, unitarily diagonalizable (i.e., 
normal [51, Theorem 24.8]). 
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CG and SYMMLQ iterates Xk diverge to some nullvectors of A [7, Propositions 2.7, 
2.8, and 2.15; Lemma 2.17]. MINRES often seems more desirable to users because its 
residual norms are monotonically decreasing. On singular compatible systems, MIN- 
RES returns x* [7, Theorem 2.25]. On singular incompatible systems, MINRES re- 
mains reliable if it is terminated with a suitable stopping rule that monitors ||^4rfc|| [8, 
Lemma 3.3], but the solution is generally not x* [8, Theorem 3.2]. MINRES-QLP 
[8, 9, 11, 7] is a significant extension of MINRES, capable of computing x\ simultane- 
ously minimizing residual and solution norms. The additional cost of MINRES-QLP is 
moderate relative to MINRES: 1 vector in memory, 4 axpy (y «— ax + y), and 3 vector 
scaling (x <— ax) per iteration. The efficiency of MINRES is partially, and in some 
cases almost fully, retained in MINRES-QLP by transferring from a MINRES phase to a 
MINRES-QLP phase only when an estimated n(A) exceeds a user-specified value. The 
MINRES phase is optional, consisting of only MINRES iterations for nonsingular and 
well-conditioned subproblems. The MINRES-QLP phase handles less well-conditioned 
and possibly numerically singular subproblems. In all iterations, MINRES-QLP uses 
QR factors of the tridiagonal matrix from a Lanczos process and then applies a second 
QR decomposition on the conjugate transpose of the upper-triangular factor to obtain 
and reveal the rank of a lower-tridiagonal form. On nonsingular systems, MINRES- 
QLP enhances the accuracy (with less rounding errors) and stability of MINRES. It is 
applicable to symmetric and Hermitian problems with no traditional restrictions such 
as nonsingularity and dcnnitcncss of A or compatibility of b. 

The aforementioned established Hermitian methods are not directly applicable 
to complex or skew symmetric equations. For consistent complex symmetric prob- 
lems, which could arise in Helmholtz equations, linear systems that involve Hankel 
matrices, or applications in quantum dynamics, electromagnetics, and power systems, 
we may apply a non-Hermitian Krylov subspace method disregarding the symmetry 
of A, or a Hermitian Krylov solver (such as CG, SYMMLQ, MINRES, or MINRES- 
QLP) on the equivalent normal equation or an augmented system twice the original 
dimension. They suffer increasing memory, conditioning, or computational costs. An 
exception 2 is a special version of QMR by Freund (1992) [18], which takes advantage 
of the matrix symmetry by using an unsymmetric Lanczos framework. Unfortunately, 
it is known that the algorithm may be affected by non-benign breakdowns unless a 
look-ahead strategy is implemented. Another less than elegant feature of QMR is the 
vector norm of choice is induced by the inner product x T y but it is not a proper vector 
norm (e.g., ^ x T := [it], where i = >/— 1, yet x T x = 0). Besides, QMR is designed 
for only nonsingular and consistent problems. Inconsistent complex symmetric prob- 
lems (1.1) could arise from shifted problems in inverse or Rayleigh quotient iterations; 
mathematically or numerically singular or inconsistent systems, in which A or b are 
vulnerable to errors due to measurement, discretization, truncation, or round-off. In 
fact, QMR and most non-Hermitian Krylov solvers (other than LSQR) fail to converge 
to x^ on an example as simple as A = i diag([J]) and b = i[\], for which x^ = [J]. 

Here we extend the symmetric and Hermitian algorithms MINRES and MINRES- 
QLP to complex symmetric systems. The main aim is to deal reliably with compatible 
or incompatible systems and to return the unique solution of (1.1). Like QMR and 
the Hermitian Krylov solvers, it exploits the matrix symmetry. 

Noting the similarities in the definitions of skew symmetric matrices (A — —A T £ 
R nx ™ ) and complex symmetric matrices and motivated by algebraic Riccati equa- 



2 It is noteworthy that among direct methods for large sparse systems, MA57 and ME57 [14] are 
available for real and complex symmetric problems. 
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tions [32] and more recent, novel applications of Hodge theory in data-mining [33, 20], 
we evolve MINRES-QLP furthermore for solving skew symmetric linear systems. Greif 
and Varah [22] adapted CG for nonsingular skew symmetric linear systems that are 
skew-A conjugate, meaning — A 2 is symmetric positive definite. The algorithm is 
further restricted to A of even-order since a skew symmetric matrix of odd order is 
singular. Our MINRES-QLP extension has no such limitations and are applicable to 
singular problems. For skew Hcrmitian problems with skew Hcrmitian matrices or 
operators (A = —A* G C nx "), our approach is to transform them into Hermitian sys- 
tems so that they could immediately take advantage of the original Hermitian version 
of MINRES-QLP. 

1.1. Notation. For an incompatible system, Ax sa b is shorthand for the LS 
problem (1.1). We use "~" to mean "approximately equal to". The letters i, j, 
k in subscripts or superscripts denote integer indices, otherwise they may represent 
V— T; c and s cosine and sine of some angle 9; e^ the fcth unit vector; e a vector 
of all ones; and other lower-case letters such as b, u, and x (possibly with integer 
subscripts) denote column vectors. Upper-case letters A,T k , VJ., ... denote matrices, 
and I k is the identity matrix of order k. Lower-case Greek letters denote scalars; in 
particular, e ~ 10 -16 denotes the floating-point double precision. If a quantity 5 k is 
modified one or more times, we denote its values by 5k, 5 k , .... We use diag(i;) to 
denote a diagonal matrix with elements of a vector v on the diagonal. The transpose, 
conjugate, and conjugate transpose of a matrix A is denoted as A T , A, and A* = A 
respectively. The symbol || • || denotes the 2-norm of a vector (||x|| = \/x*x) or a 
matrix (||A|| = a x from A's SVD). 

1.2. Overview. In Section 2 we briefly review the Lanczos processes and QLP 
decomposition before developing the algorithms in Sections 3-5. Preconditioned algo- 
rithms are described in Section 6. Numerical experiments are described in Section 7. 
We conclude with future work and related software in the last section, Section 8. Our 
pseudocode and a summary of norm estimates and stopping conditions are given in 
Appendices A and B. 

2. Review. In the following few subsections, we summarize algebraic methods 
necessary for our algorithmic development. 

2.1. Lanczos processes. Given a complex symmetric operator A and a vector 
b, a Lanczos- like 3 process [2, 23], which we name as the Saunders process, computes 
vectors Vk and tridiagonal matrices T& according to Vq = 0, j3\Vi = b, and then 4 

p k = Av k , a k = v* k p k , j3 k +iv k+ i = p k - a k v k - /3 fc w fc _i (2.1) 

for k = 1, 2, ... ,£, where we choose f3 k > to give ||ufc|| = 1. In matrix form, 

'Oil fa 
fa OL2 



AV k = V k+1 T k , T k = 



■■ & 

fa Q k 
fa-+l 



T k 



V k =[v\ ■■■ v k ]. (2.2) 



3 We distinguish our process from the complex symmetric Lanczos process [36] as used in 
QMR [18], 

Numerically, p k = Av k - fa.v k _ 1 , a k = vtp k , fa. +1 v k+1 = p k - a k v k is slightly better [37]. 
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In exact arithmetic, the columns of V k are orthogonal and the process stops with 
k — £ and /3^ +1 = for some £ < n, and then AVi = ViTg. For derivation purposes we 
assume that this happens, though in practice it is rare unless V k is reorthogonalized 
for each k. In any case, (2.2) holds to machine precision and the computed vectors 
satisfy ||T4||i ~ 1 (even if k ^> n). 

If instead we are given a skew symmetric A, the following is a Lanczos process [22, 
Algorithm l] 5 that transforms A to a series of expanding, skew symmetric tridiagonal 
matrices T k and generates a set of orthogonal vectors in V k in exact arithmetic: 



Pk = Av k , 



Its associated matrix form is 



AV k =V k+1 T k , T k = 



-flk+lVk+l = Pk ~ PkVk- 



fo 

-fta 



(2.3) 



-Pk 



Pk 


-Pk+l. 



T k 
-h+ie k 



(2.4) 



If the skew symmetric process were to be forced on a skew Hermitian matrix, the 
resultant V k would not be orthogonal. Instead, we multiply Ax ps b by i on both sides 
to yield a Hermitian problem since (iA)* — iA* — iA. This simple transformation 
by a scalar multiplication 6 preserves the condition for k(A) — n(iA) and allows us to 
adapt the original Hermitian Lanczos process with vq = 0, f5\V\ = ib, followed by 



p k = iAv k , a k = v* k p k , f3 k+ iv k+1 =p k - a k v k - /3 k v k . 



(2.5) 



Its matrix form is the same as (2.2) except that the first equation is iAV k = V k +iT k . 

2.2. Properties of the Lanczos processes. The following properties of the 
Lanczos processes are notable: 

1. It A and b real, then the Saunders process (2.1) reduces to the symmetric 
Lanczos process. 

2. The complex and skew symmetric properties of A carry over to T k by the 
Lanczos processes (2.1) and (2.3) respectively. From the skew Hermitian 
process (2.5), T k is symmetric. 

3. The skew-symmetric Lanczos process (2.3) is only two-term recurrent. 

4. In (2.5), there are two ways to form p k : p k — (iA)v k or p k = A(iv k ). One may 
be cheaper than the other. If A is dense, iA takes 0(n 2 ) scalar multiplications 
and storage. If A is sparse or structured as in the case of Toeplitz, iA just 
takes 0(n) multiplications. In contrast, iv k takes n£ multiplications, where £ 
is theoretically bounded by the number of distinct nonzero eigenvalues of A, 
but in practice £ could be an integer multiple of n. 

5. While the skew Hermitian Lanczos process (2.5) is applicable to a skew sym- 
metric problem, it involves complex arithmetic and is thus computationally 
more costly than the skew symmetric Lanczos process with a real b. 

6. If A is changed to A — al for some scalar shift a, T k becomes T k — a I and 
V k is unaltered, showing that singular systems are commonplace. Shifted 



5 Anothcr Lanczos process for skew symmetric A uses a different measure to normalize /3 k +i was 
developed in [53, 50]. 

6 Multiplying by — t works equally well but without loss of generality, we use i. 



6 S.-C. T. CHOI 

problems appear in inverse iteration or Rayleigh quotient iteration. The 
Lanczos frameworks easily and efficiently handles shifted problems. 

7. Shifted skew symmetric matrices are not skew symmetric. This notion also 
applies to the case of shifted skew Hermitian matrices. Nevertheless they 
arise often in Toeplitz problems [3, 4]. 

8. For the skew Lanczos processes, the fcth Krylov subspace generated by A 
and b is defined to be JCk(A, b) = range(Vfe) = span{6, Ab, . . . , A k ~ 1 b}. For 
the Saunders process, we have a modified Krylov subspace [43] that we call 
the Saunders subspace, K-^iAA^b) © JCk 2 (AA,Ab), where k\ + k 2 = k and 
< fci - k 2 < 1. 

9. Tfc has full column rank k for all k < t since /3i, . . . , Pk+i > 0. 

Theorem 2.1. Tg is nonsingular if and only if b £ range(A). Furthermore, 
rank(T^) = I — 1 in the case b <£ range(A). 

Proof. We prove below for A complex symmetric. The proofs are similar for the 
skew symmetric and skew Hermitian cases. 

We use AV i = V{Tt twice. First, if Ti is nonsingular, we can solve Tiyp = 
/?iei and then AVeyg — VpT^y^ — Vi{5\e\ = b. Conversely, if b € range(A), then 
range(V^) C range(A) = range( J 4*). Suppose Tg is singular. Then there exists z =/= 
such that TfZ — and thus VgTgz — AVgz = 0. That is, =/= Vgz € null(A). But this 
is impossible because Vgz e range(A*) and null(A) n range(A*) = {0}. Thus Tg must 
be nonsingular. 



1/6 grange^), T t = [2>_i 
rank(T^_!) = i — 1 since rank(Tfe 



Pie i. 
a? 



is singular. It follows that £ > rank(T^) > 
k for all k < I. Therefore rank(T^) =1-1. D 



2.3. QLP decompositions for singular matrices. Here we generalize, from 
real to complex, the matrix decomposition pivoted QLP by Stewart in 1999 [49] 7 . It 
is equivalent to two consecutive QR factorizations with column interchanges, first on 
A, then on R*: 





\R S~ 




\R* 


n" 




\R 0] 


QrAUr = 





Ql 


S* 


o_ 


1U = 






(2.6) 



giving nonnegative diagonal elements, where Hr and Hl are (real) permutations cho- 
sen to maximize the next diagonal element of R and R at each stage. This gives 



A = QLP, Q = Q* R U L , 



L 



R* 




P 



QlTI t r , 



with Q and P orthonormal. Stewart demonstrated that the diagonal elements of L 
(the L-values) give better singular-value estimates than those of R (the R-values), 
and the accuracy is particularly good for the extreme singular values o~\ and o~ n : 



Ri 



0~i, 



La ~ (Ti, G\ > max La > maxRa, mmRn > imnLu > o~ n , (2.7) 



The first permutation H R in pivoted QLP is important. The main purpose of the 
second permutation II ^ is to ensure that the L-values present themselves in decreasing 
order, which is not always necessary. If fl^ = fl^ = /, it is simply called the QLP 
decomposition, which is applied to each Tk from the Lanczos processes (Section 2.1) 
in MINRES-QLP. 



7 QLP is a special case of the ULV decomposition, also by Stewart [48, 31]. 
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2.4. Householder Reflectors. Givens rotations are often used to selectively 
annihilate matrix elements. Householder reflectors [51] of the following form may be 
considered the Hermitian counterpart of Givens rotations: 



where the subscripts indicate the positions of c = cos(8) G M and s — sin(0) G C for 
some angle 9. They are orthogonal and Qf . = I as would any reflector, meaning Qi 

is its own inverse. Thus c + Is 



1. We often use the shorthand Qij 



c s 

8 —C 



In the next few sections we extend MINRES and MINRES-QLP to solving complex 
symmetric problems (1.1). Thus we tag the algorithms with "CS-". The discussion 
and results can be easily adapted to the skew symmetric and skew Hermitian cases 
and so we do not go into details. In fact, the skew Hermitian problems can be solved 
by the implementations [10, 12] of MINRES and MINRES-QLP for Hermitian problems. 
For example, we can call the MATLAB solvers by x = minresd * A, i * b) and x 
= minresqlpCi * A, i * b) achieving code reuse immediately. 

3. CS-MINRES standalone. CS-MINRES is a natural way of using the complex 
symmetric Lanczos process (2.1) to solve (1.1). For k < £, if x k = V k y k for some vector 
j/fc, the associated residual is 



r k = b - Ax k =b- AV k y k = $\V\ - V k+ iTky k = T4 + i(/3iei - Tk_y k ). 



(3.1) 



To make r k small, it is clear that $\e,\ — T k y k should be small. At this iteration k, 
CS-MINRES minimizes the residual subject to x k G range(I / / c ) by choosing 



y k = arg min \\T k y - /3iei||. 
yec k — 



(3.2) 



By Theorem 2.1, T k has full column rank and the above is a nonsingular problem. 

3.1. QR factorization of T k . We apply an expanding QR factorization to the 
subproblem (3.2) by Q = 1 and 



lk,k+l- 



Cfc Sfc 

$k -Cfc 



= Qfc,fc 



fc+1 



Q 



fc-1 



1 



Qk[T k f3ie 1 } = 



Rk tk 

<\>k 



(3.3) 



where c k and s k form the Householder reflector Q k . k +\ that annihilates (3 k +\ in T k 
to give upper-tridiagonal R k , with R k and t k being unaltered in later iterations. We 
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can express the last expression in (3.3) in terms of its elements for further analysis: 







7i 



(2) 
72 



63 

X( 2 ) 







T 2 




£fc 


) 


V 

§k 


= 




= A 


*f 








(2) 

Tk 




Tfc 









_^fc_ 





SlC 2 



Si ' ' 'Sfc-lCfe 

Si • • -s k -is k 



(3.4) 



(where the superscripts are defined in Section 1.1). With <j>i = f3i > 0, the full action 
°f Qk.k+i in (3-3), including its effect on later columns of T il k < i < £, is described by 



(3.5) 



Cfc 


Sk 




' Ik 


<Wl 





<t>k-l 




(2) 
Ik 


°fe+l 


£fc+2 


Tfc 


«A- 


~c k _ 




fik+l 


a k +i 


Pk+2 










7fe+i 


<5fc+2 


0fc 



„( 2 ) 



„( 2 ) 



Thus for each j < k < £ we have s.,-7) = /3 J+ i > 0, giving 71, 7] 7^ 0, and therefore 
each Rj is nonsingular. Also, r k = 4> k _\c k and 4> k — 4> k ^i~s k 7^ 0. Hence from (3.1)- 
(3.3), we yield the following short recurrence relation for the residual norm: 

WkW = \\TkVk - Aeill = l^fel =^ Ikfell = ||r fc _i|||s fe | = ||r fc _i|||s fe |, (3.6) 

which is monotonically decreasing and tending to zero if Ax = b is compatible. 

3.2. Solving the subproblem. When k < £, a solution of (3.2) satisfies R k y k = 
t k . Instead of solving for y k , CS-MINRES solves R k D^ = V£ by forward substitution, 
obtaining the last column d k of D k at iteration k. This basis generation process can 
be summarized as 

d\ = Vi/71, ^2 = («a - <Mi)/7 2 j 
<4 = (v k - S\'dk-\ - tkdk-2)h k ■ 
At the same time, CS-MINRES updates x k via xq = and 



(3.7) 



£/c = Vfe2/ fc = D k R k y k = D k t k = x k -\ + r k d k 



(3.8) 



3.3. Termination. When k = £, we can form Tg but nothing else expands. In 
place of (3.1) and (3.3) we have rg — Vi(fi\e\—Tnyi) and Qg-i [Tg /9iei] = [ifo if] . 
It is natural to solve for yg in the subproblem 



min \\Tiyi- fiieiW 
viec* 



min \\Rtyt 



tt 



(3.9) 



There are two cases to consider: 

1. If Tg is nonsingular, Rgyg = tg has a unique solution. Since AViyi — VgTgyg — 
6, the problem Ax = 6 is compatible and solved by xg = Vgyg with residual 



rg = 0. Theorem 3.1 below proves that xg 



= *t 



assuring us that CS-MINRES 



is a useful solver for compatible linear systems even if A is singular. 
2. If Tg is singular, A and Rg are singular (Ru — 0) and both Ax — b and Rgyg = 
tg are incompatible. The optimal residual vector is unique, but infinitely many 
solutions give that residual. CS-MINRES sets the last element of yg to be zero. 
The final point and residual stay as Xg—\ and rg—\ with ||r^_i|| = |<fo-i| = 
/3i|si| ■ ■ • |s^_i| > 0. Theorem 3.2 below proves that xi-\ is a LS solution of 
Ax f=a b (but not necessarily x'). 



Complex and Skew Symmetric Minimal Residual Methods 



Theorem 3.1. Ifbe range (A), the final CS-MINRES point xg — x* and rg = 0. 

Proof. If b e range(A), the Lanczos process gives AVt = VgT^ with nonsingular 
Ti, and CS-MINRES terminates with Axg = b and xt — Vtyt — A*q — Aq, where 
q = VgT^ yi. If some other point x satisfies Ax = b, let p = x — X£. We have Ap = 
and x\p = q*Ap = 0. Hence ||z|| 2 = \\x t + p\\ 2 = ||ir £ || 2 + 2x*p + \\p\\ 2 > \\x e \\ 2 . 
By (3.6), r t = 0. D 

Theorem 3.2. Ifbg range(A), then ||^_i|| = and the CS-MINRES xt-\ is 
an LS solution. 

Proof. Since b ^ range(A), Ti is singular and Ru = 7£ = 0. By Lemma B.2, 
A*(Axg-i — b) = —Ar^-i = — \\ri-i\\"/(Ve — 0. Thus Xe-i is an LS solution. D 

4. CS-MINRES-QLP standalone. In this section we develop CS-MINRES-QLP 
for solving ill-conditioned or singular symmetric systems. The Lanczos framework 
is the same as in CS-MINRES and QR factorization is applied to Tk in subprob- 
lem (3.2) for all k < £; see Section 3.1. By Theorem 2.1 and Property 9 in Section 2.2, 
rank(Tfe) = k for all k < £ and rank(I» >£-l. CS-MINRES-QLP handles T t in (3.9) 
with extra care to constructively reveal rank(T^) via a QLP decomposition, so it can 
compute the minimum- length solution of the following subproblcm instead of (3.9): 

ftei||. (4.1) 



\\yih s -t- 



ye. € arg min \\T e y e 



Thus CS-MINRES-QLP also applies the QLP decomposition on J\ in (3.2) for all k < £. 

4.1. QLP factorization of Tfe. In CS-MINRES-QLP, the QR factorization (3.3) 
of Tfc is followed by an LQ factorization of Rk : 



QkTk — 







RkPk — Li 



so that QkTkPk = 



Lk 




(4.2) 



where Qk and Pk are orthogonal, Rk is upper tridiagonal, and Lk is lower tridiagonal. 
When k < £, both Rk and Lk are nonsingular. The QLP decomposition of each Tk 
are performed without permutations, and the left and right reflectors interleaved [49] 
in order to ensure inexpensive updating of the factors as k increases. The desired 
rank-revealing properties (2.7) are retained in the last iteration when k — £. 

We elaborate on interleaved QLP here. As in CS-MINRES, Qk in (4.2) is a product 
of Householder reflectors, see (3.3) and (3.5), while Pk involves a product of pairs of 
Householder reflectors: 



?fe,fe+i ••• Qi,4 Q2,3 Ql,2, 



Pk = Pi.2 Pi, 3 P: 



3-^2,3 



Pk-2,kPk-l,k- 



For CS-MINRES-QLP to be efficient, in the A:th iteration (k > 3) the application of 
the left reflector Qk,k+i is followed immediately by the right reflectors Pk-2,k, Pk-i,k, 
so that only the last 3x3 bottom right submatrix of Tk is changed. These ideas 
can be understood more easily from the following compact form, which represents the 
actions of right reflectors on Rk obtained from (3.5): 



r (5) 

Tk-2 




£fc 




tffc-i 


(4) 

7i-i 


4 2) 

(2) 




_ 




Tk . 




r„< 6 ) 




- 




7fe-2 








0(2) 

u k-l 


(4) 

7fe_i 


4 3) 

(3) 




Vk 




7 fc . 





Cfc2 



Sk2 



Cfe3 

Sfc3 



Sfe2 



"Cfc2. 



Sk3 
"Cfc3j 



Ck3 

Sk3 - 

" (6) 
Tk-2 
(2) 



i) 



fc-1 

Vk 



S/s3 
"Cfc3. 



(5) 

7k_i 



(4.3) 



7 fc 



(4) 



10 



S.-C. T. CHOI 



4.2. Solving the subproblem. With yk = PkUk, subproblem (3.2) after QLP 
factorization of Tj, becomes 



Uk 





Lk 




tk 


arg min 




u — 




ueC k 







<j>k\ 



(4.4) 



where tk and 4>k are as in (3.3). At the start of iteration k, the first k— 3 elements of 
Uk, denoted by jXj for j < fc— 3, are known from previous iterations. We need to solve 
only the last three components of Uk from the bottom three equations of LkUk = tk'- 



(6) 
Tk-2 

u k-l 


(5) 

7k- 1 






r (3) i 

Mfc-2 

(2) 
Mfc-1 


= 


r T (2) 

T (2) 
7 fc-l 


Vk 


$k 


( 4 ) 

Tk . 




. ^ fc 




Tfc 



Tfe-2 - Vk-2f-k-4 - ^fe-2Mfe-3 



Tfe-1 



(3) 
Tfc 



(4.5) 



When k < £, Tk has full column rank, and so do L& and the above 3x3 triangular 
matrix. CS-MINRES-QLP obtains the same solution as CS-MINRES, but by a different 
process (and with different rounding errors). The CS-MINRES-QLP estimate of x is 
Xk = VkVk = VkPkUk = WkUk, with theoretically orthonormal Wk = VkPk, where 



W k = \V k -lPk-l v k \P k -2.kPk-l, 



W, 



(4) 
fc-3 



wi% 



(3) 
'4-2 

(4) 
4-2 



(2) 
4-1 

(3) 
4-1 



'-'A- 



P 



,(2) 



(4.6) 



fcft 



k-2,k-Tk-l,k 



Lastly, we update Xk-2 and compute Xk by short-recurrence orthogonal steps (using 
only the last three columns of Wk)'- 



(2) 
J k-2 



Xk 



(2) 
J k-3 



(4) „(3) 



"4-2/4-2; where a; 

(2) i (3) (2) . (2) 
4-2+ W fc-lMfc_l + Ofe- 



(2) 
fc-3 



^fc-3 u fc-3' 



(4.7) 
(4.8) 



4.3. Termination. When k = I and yt — PgUi, the final subproblem (4.1) 
becomes 



minlluJU s.t. ug € arg min \LgUg — i^||. 



(4.9) 



Q^+i is neither formed nor applied (see (3.3) and (3.5)), and the QR factorization 
stops. To obtain the minimum-length solution, we still need to apply Pt-v,iPt.-\,i. on 
the right of Ri and V g in (4.3) and (4.6) respectively. If b 6 range(-A), then Lg is 
nonsingular and the process in the previous subsection applies. If b ^ range(^4), the 
last row and column of Lg are zero, i.e., Lg — [ Ll ^ 1 ] (see (4.2)), and we need to 
define ug = [ Ut (j 1 ] and solve only the last two equations of L^_iU^_i = tg_-\: 



(6) 
Tt-2 

(2) 
1 



t) 







~„( 3 ) " 




~J?) ' 






^1-2 




T £-2 


-J 5 ) 




„( 2 ) 




n-W 


7«-i. 




M*-l_ 




. T t-1. 



(4.10) 



The recurrence relation (4.8) simplifies to Xg — x\_ 2 + Wf_ x n\_ v 
theorem proves that CS-MINRES-QLP yields x' in this last iteration. 
Theorem 4.1. In CS-MINRES-QLP, xg = x^ . 



The following 
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Proof. When b £ range (.A), the proof is the same as that for Theorem 3.1. 

When b $ range(A), for all u = [uj_ 1 ^g] T £ C e that solves (4.4), CS-MINRES- 
QLP returns the minimum-length LS solution ug — [uj_ 1 0} T by the construction 
in (4.10). For any x £ range(W^) = range(V^) C range(A) = range(A*) by (4.6) and 
AV e = V e T e , 

\\Ax-b\\ = \\AW t u-b\\ = \\AVgPgu-b\\ = \\ViT t Pm-p x Vte x \\ = \\T t P t u- freiH 



Qt-iTtP£U- 



tl-l 




Li-i 


0" 


u — 


tl-l 


. §1 . 







0_ 




. <Pe \ 



Since Lg-i is nonsingular, \<\>g\ — min||^4x — 6|| can be achieved by xg = Wgug = 
Wt-iUt-i and ||x£|| = ||W£_iu^_i|| = ||it^_i||. Thus xg is the minimum- length LS 
solution of \\Ax — b\\, i.e., xg — argmin{||x|| | A* Ax = A*b, x £ range(y4*)}. Likewise 
yg = Pgug is the minimum-length LS solution of \\Tgy — /3iei|| and so yg £ range(T^), 
i.e. yg = Tfz = Tgz for some z. Thus xg = Vgyg — VgTgz — AVgz = A*Vgz £ 
range(A*). We know that x' = arg min{ 1 1 x \ \ A* Ax = A*b, x £ C"} is unique and 



rt 



£ range(A*). Since xg £ range(A*), we must have xg 



= zt 



D 



5. Transferring CS-MINRES to CS-MINRES-QLP. CS-MINRES and CS- 
MINRES-QLP behave very similarly on well-conditioned systems. However, compared 
to CS-MINRES, CS-MINRES-QLP requires one more vector of storage, and each iter- 
ation needs 4 more axpy (y <— ax + y) and 3 more vector scaling (x ■£- ax) . It would 
be a desirable feature to invoke CS-MINRES-QLP from CS-MINRES only if A is ill- 
conditioned or singular. The key idea is to transfer CS-MINRES to CS-MINRES-QLP 
at an iteration k < £ when T^ has full column rank and is still well-conditioned. At 
such an iteration, the CS-MINRES point xf and CS-MINRES-QLP point x k are the 



same, so from (3.8), (4.8), and (4.4): xf = x k 
(4.2), and (4.6), 



D k t 



k'k 



= w k L-H k . 



From (3.7), 



D k L k = (V k R^){R k P k ) = V k P k - W k . 



(5.1) 



The vertical arrow in Figure 5.1 represents this process. In particular, we can transfer 
only the last three CS-MINRES basis vectors in D k to the last three CS-MINRES-QLP 
basis vectors in W k : 



[w fc _2 w k -i w k ] = [d k -2 dfc-i dk] 



(6) 
7fc-2 



fc-1 

Vk 



(5) 

Tk-i 

$ k 



(4) 

Tk . 



(5.2) 



(2) 

Furthermore, we need to generate the CS-MINRES-QLP point X k _ 3 in (4.7) from 
the CS-MINRES point xf_ x by rearranging (4.8): 



(2) 
c fe-3 



.M 



(3) (2) 
, k-2f 1 k-2 



(2) 



(5.3) 



,.(2) 



Then the CS-MINRES-QLP points x)fj_ 2 and x k can be computed using (4.7) and (4.8). 
It is clear from (5.1) and (5.2) that we still need to do the right transformation 
RkPk = L k in the CS-MINRES phase and keep the last 3x3 bottom right submatrix 
of L k for each k so that we are ready to transfer to CS-MINRES-QLP when necessary. 
We then obtain a short recurrence for ||xfc|| (see Section B.5) and for this computation 
we save flops relative to the standalone CS-MINRES algorithm, which computes \\x k \\ 
directly in the NRBE condition associated with ||rfc|| in Table B.l. 
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Fig. 5.1. Changes of basis vectors within and between the two phases CS-MINRES and CS- 
MINRES-QLP; see equations (3.7), (4.6), and (5.2) for details. 



CS-MINRES 
phase 




Vj 



CS-MINRES-QLP 
phase 



W k 



In the implementation of CS-MINRES-QLP, the iterates transfer from CS-MINRES 
to CS-MINRES-QLP when an estimate of the condition number of Tk (see (B.4)) 
exceeds an input parameter trancond. Thus, trancond > 1/e leads to CS-MINRES 
iterates throughout (that is, CS-MINRES standalone), while trancond — 1 generates 
CS-MINRES-QLP iterates from the start (that is, CS-MINRES-QLP standalone). 

6. Preconditioned CS-MINRES and CS-MINRES-QLP. Well-constructed 
two-sided preconditioners can preserve problem symmetry and substantially reduce 
the number of iterations for nonsingular problems. For singular compatible problems, 
we can still solve the problems faster, but generally obtain only LS solutions that are 
not of minimum length — this is not an issue due to algorithms but the way two-sided 
preconditioning is set up for singular problems. For incompatible systems (which are 
necessarily singular), preconditioning alters the "least squares" norm. To avoid this 
difficulty we could work with larger equivalent systems that are compatible (see for 
example approaches in [8, Section 7.3]), or we could apply a right-hand-side precondi- 
tioner M preferably such that AM is complex symmetric so that our algorithms are 
directly applicable. For example, if M is nonsingular (complex) symmetric and AM 
is commutative, then AMy sw b is a complex symmetric problem with y = M~ l x. 
This approach is efficient and straightforward. We devote the rest of this section to 
derive a two-sided preconditioning method. 

We use a symmetric positive definite or a nonsingular complex symmetric pre- 
conditioner M . For such an M, it is known that the Cholesky factorization exists, 
i.e., M = CC T for some lower triangular matrix C, which is real if M is real, or 
complex if M is complex. We may employ commonly used construction techniques of 
preconditioners like diagonal preconditioning and incomplete Cholesky factorization 
if the nonzero entries of A are accessible. It may seem unnatural to use a symmetric 
positive-definite preconditioner for a complex symmetric problem. However, if avail- 
able, its application may be less expensive than a complex symmetric preconditioner. 

We denote the square root of M as M~ i . It is known that a complex symmetric 
root always exists for a nonsingular complex symmetric M even though it may not 
be unique; see [28, Theorems 7.1, 7.2, and 7.3] or [30, Section 6.4]. Preconditioned 
CS-MINRES (or CS-MINRES-QLP) applies itself to the equivalent system Ax = b, 
where A = M~*AM~2, b = M~%b, and x = M~^x. Implicitly, we are solving 
an equivalent complex symmetric system C~ x AC~ T y = C~ x b, where C T x — y. In 
practice, we work with M itself (solving the linear system in (6.1)). For analysis, we 
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can assume C = M' for convenience. An effective preconditioner for CS-MINRES or 
CS-MINRES-QLP is one such that A has a more clustered eigenspectrum and become 
better conditioned, and it is inexpensive to solve linear systems that involve M . 

6.1. Preconditioned Lanczos process. Let V k denote the Lanczos vectors of 
the fcth Krylov subspace generated by A and the conjugate of b. With vo = and 
PiVi = b, for k = 1,2, ... we define 

z k = (3 k M^v k , qk = f3 k M~iv k , so that Mq k = z k . (6.1) 



Then f3 k = ||/3fcWfc| = \/q k z k: an( l the Lanczos iteration is 

p fc = Av k = M-iAM-3u fe = M-$Aqk/Pk, 

ak =v* k pk =qlAq k /Pl, 
Pk+iVk+i = M-2AM~*v k - a k v k - PkVk-i- 

Multiplying the last equation by Ms we get 

z k +i = /3 k+1 M^v k+1 = AM~H k - a k M^v k - p k M*v k -i 

1 , afc Pk 

= -^Ag fe - -5-zfc - ^ — %-i- 

Pfc Pk Pk-1 

The last expression involving consecutive zy's replaces the three-term recurrence in 
Vj's. In addition, we need to solve a linear system Mq k — z k (6.1) at each iteration. 

6.2. Preconditioned CS-MINRES. From (3.8) and (3.7) we have the following 
recurrence for the A:th column of D k = V k RZ and x k : 

dk = {pk - S k 'dk-i - e k d k - 2 )h k j > %k = %k-i + A dk- 

Multiplying the above two equations by M~ 5 on the left and defining d k = M~*d k , 
we can update the solution of our original problem by 

dk = (-o~Qk - S k 'd~k-i - ekdk-2) /% , x k = M~*x k = x k -i + rf'dk. 

6.3. Preconditioned CS-MINRES-QLP. A preconditioned CS-MINRES can 
be derived very similarly. The additional work is to apply right reflectors P k to R k , 
and the new subproblem bases are W k = V k P k , with x k — W k u k . Multiplying the 
new basis and solution estimate by M _ 5 on the left, we obtain 

W k =M-^W k = M-W k P k , 
x k = M-h k = M-5 W k u k = W k u k = x { k % + l^klx^k-i + Vkw k 2) . 

7. Numerical experiments. In this section we present computational results 
based on Matlab 7.12 implementation of CS-MINRES-QLP and SS-MINRES-QLP, 
which are made available to the public as open-source software and accords with the 
philosophy of reproducible computational research [13, 6]. The computations were 
performed in double precision on a Mac OS X machine with a 2.7GHz Intel Core i7 
and 16GB RAM. 
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7.1. Complex symmetric problems. Although the SJSU Singular Matrix 
Database [17] currently contains only one complex symmetric matrix (named dwg961a) 
and only one skew symmetric matrix (plskl919), it has a sizable set of singular sym- 
metric matrices, which can be handled by the associated Matlab toolbox SJsingu- 
lar [16]. We constructed multiple singular complex symmetric systems of the form 
H = iA, where A is symmetric and singular. It is clear that all the eigenvalues of 
H lie on the imaginary axis. For a compatible system, we simulated b = Hz, where 
Zi ~ i.i.d. U(0, 1), i.e., Zi were independent and identically distributed random vari- 
ables, whose values were drawn from the standard uniform distribution with support 
[0, 1]. For a LS problem, we generated a random b with bi ~ i.i.d. U(0, 1) and it is 
almost always true that b is not in the range of the test matrix. In CS-MINRES-QLP, 
we set the parameters maxit = An, tol = e, and trancond = 10 -7 for the stopping 
conditions in Table B.l and the transfer process from CS-MINRES (see Section 5). 

We compare the computed results of CS-MINRES-QLP and Matlab's QMR to 
solutions computed directly by the truncated SVD (TSVD) of H utilizing MATLAB's 
function pinv. For TSVD we have x t = S<r->tllHlle u i u t^i with parameter t > 0. 
Often t is set to 1, and sometimes to a moderate number such as 10 or 100; it defines a 
cut-off point relative to the largest singular value of H. For example, if most singular 
values are of order 1 and the rest are of order \\H\\e ~ 10~ 16 , we expect TSVD to 
work better when the small singular values are excluded, while SVD (with t = 0) 
could return an exploding solution. 

In Figure 7.1 we present the results of 50 consistent problems of the form Hx = b; 
it plots the relative error norm of approximate solutions computed by CS-MINRES- 
QLP and QMR with respect to TSVD solutions against a scalar multiple of k(H) = 
k(A) — it is known that an upper bound of the perturbation error of a singular linear 
system involves the condition of the corresponding matrix [45, Theorem 5.1]. The 
diagonal dotted red line represents the best results we could expect from any numerical 
method with double precision. We can see that both CS-MINRES-QLP and QMR did 
very well on all problems except for one in each case. CS-MINRES-QLP performed 
slightly better as a few additional problems in QMR attained relative errors of less 
than 10~ 5 . 

Our second test set involve complex symmetric matrices that have more wide- 
spread eigenspectrum than those in the first test set. Let A = l/AV T be an eigenvalue 
decomposition of symmetric A with |Ai| > ••• > |A„|. For i = 1, . . . ,n, we define 
di = (2ui — 1)| Ai| , where ui ~ i.i.d. U(0,1) if A.; ^ 0, or d. L = otherwise. Then 
the complex symmetric matrix M = VDV T + iA has the same (numerical) rank 
as A and its eigenspectrum is bounded by a ball of radius approximately equal to 
| Ai| on the complex plane. In Figure 7.2 we summarize the results of solving 50 
such complex symmetric linear systems. It is clear that CS-MINRES-QLP behaved as 
stably as it did with the first test set. However, QMR is obviously more sensitive to the 
nonlinear spectrum: two problems did not converge and about ten additional problems 
converged to their corresponding x^ with no more than four digits of accuracy. 

Our third test set consists of linear LS problems (1.1), in which A — H in the 
upper plot of Figure 7.3 and A = M in the lower plot. In the case of H, CS-MINRES- 
QLP did not converge for two instances but agreed with the TSVD solutions in five or 
more digits of accuracy for almost all other instances. In the case of M, CS-MINRES- 
QLP did not converge for five instances but agreed with the TSVD solutions in about 
five or more digits of accuracy for almost all other instances. Thus the algorithm is to 
some extent more sensitive to a nonlinear eigenspectrum in LS problems — this is also 
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Fig. 7.1. 50 consistent singular complex symmetric systems. This figure is reproducible by 
C13Fig7_l.m. 



expected by the perturbation result that an upper bound of the relative error norm 
in a LS problem involves the square of k(A) [45, Theorem 5.2]. We did not run QMR 
on these test cases since the algorithm was not designed for LS problems. 

7.2. Skew symmetric problems. Our fourth test collection consists of 50 skew 
symmetric linear systems and 50 singular skew symmetric LS problems (1.1). The 
matrices are constructed by S — tril(A) — tril(A) T , where tril extracts the lower 
triangular part of a matrix. In both cases — linear systems in the upper subplot of 
Figure 7.4 and LS problems in the lower subplot — SS-MINRES-QLP did not converge 
for six instances but agreed with the TSVD solutions for more than ten digits of 
accuracy for almost all other instances. 

7.3. Skew Hermitian problems. We also have created a test collection that 
is consituted of 50 skew Hermitian linear systems and 50 skew Hermitian LS prob- 
lems (1.1). Each of the skew Hermitian matrix is constructed by T = S + iB, where 
S is skew symmetric as defined in the last test set, and B = A — diag([an, . . . , a ran ]), 
i.e., B is A with the diagonal elements set to zero and is thus symmetric. We solve 
the problems using the original MINRES-QLP for Hermitian problems by the trans- 
formation (iT)x sa ib. In the case of linear systems in the upper subplot of Figure 7.5, 
SH-MINRES-QLP did not converge for six and five instances respectively. For the 
other instances SH-MINRES-QLP computed approximate solutions that matched the 
TSVD solutions for more than ten digits of accuracy for almost all other instances. 
As for LS problems in the lower subplot of Figure 7.5, only five instances did not 
converge. 
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Fig. 7.2. 50 consistent singular complex symmetric systems. This figure is reproducible by 
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Fig. 7.3. 100 inconsistent singular complex symmetric systems. This figure is reproducible by 
C13Fig7_3.m. 
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Fig. 7.4. 100 singular skew symmetric systems. Upper: 50 compatible linear systems. Lower: 
50 LS problems. This figure is reproducible by C13Fig7Jf.ni. 



8. Conclusions. CS-MINRES constructs its feth solution estimate from the short 
recursion Xk = Dktk = Xk-i+T~kdk (3.8), where n separate triangular systems Rj^D T — 
V k * are solved to obtain the n elements of each direction di,...,dfc. (Only dk is 
obtained during iteration k, but it has n elements.) In contrast, CS-MINRES-QLP 
constructs Xk using orthogonal steps: Xk — WkUk = x k _ 2 + w k _ 1 ^ k _ 1 + w k Hk\ see 
(4.7)-(4.8). Only one triangular system L k Uk = ffe (4.4) is involved for each k. Thus 
CS-MINRES-QLP is numerically more stable than CS-MINRES. In the mega form, the 
additional work and storage are moderate, and efficiency is retained by transferring 
from CS-MINRES to CS-MINRES-QLP only when the estimated condition number of 
A exceeds an input parameter value. 

TSVD is known to use rank-fc approximations to A to find approximate solutions 
to min \\Ax — b\\ that serve as a form of regularization. It is fair to conclude from the 
results that like other Krylov methods CS-MINRES have built-in regularization fea- 
tures [26, 25, 35]. Since CS-MINRES-QLP monitors more carefully and constructively 
the rank of Tk, which could be k or k— 1, we may say that regularization is a stronger 
feature in CS-MINRES-QLP, as we have shown in our numerical examples. 

Like CS-MINRES and CS-MINRES-QLP, SS-MINRES and SS-MINRES-QLP are 
readily applicable to skew symmetric linear systems. We summarize and compare 
these methods in Appendix C. 

8.1. Software and reproducible research. Matlab 7.12 and Fortran 90/95 
implementations of MINRES and MINRES-QLP for symmetric, Hermitian, skew sym- 
metric, skew Hermitian, and complex symmetric linear systems with short-recurrence 
solution and norm estimates as well as efficient stopping conditions are available from 
the MINRES-QLP project website [10]. 

Following the philosophy of reproducible computational research as advocated 
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Fig. 7.5. 100 singular skew Hermitian systems. Upper: 50 compatible linear systems. Lower: 
50 LS problems. This figure is reproducible by C13Fig7_5.m. 



in [13, 6], for each figure and example in this paper we mention either the source or 
the specific Matlab command. Our Matlab scripts are available at [10]. 
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Appendix A. Pseudocode of algorithms. 

Algorithm 1: CS-Lanczos. 
input: A, b, a, maxit 

1 vo = 0, Vi =b, /?i = \\b\\, fc = 

2 while k < maxit do 
3 
4 
5 
6 
7 
8 
9 



if pk+l > then v k+1 <- v k+1 //3 k+1 else STOP 
k «- k + 1 
p k — Av k — av k 
a k = v' k p k 
p k <— p k — a> k v k 
v k+ i = p k — f3 k v k -i 
/3 fc+ i = ||Vfc+i|| 
output: Ve,T( 



Algorithm 2: SS-Lanczos. 



input: A, b, maxit 

i»o=0, vi = b, 00=0, /3i = ||&||, fc = 

2 while fc < maxit do 

3 if p k +i > then ^ fc+ i «- u fc+ i/,S fc+ i else STOP 

4 fc <— k + 1 

5 Pfc = Av k — v k 

6 Vk+1 — Pk — fikVk-1 

7 |_ /3 k +l = \\Vk+l\\ 
output: Ve,Ti 



Algorithm 3: SH-Lanczos. 



input: A,b, maxit 
l vo = 0, Vi =b, Pi = H&ll, fc = 



2 while fe < maxit do 
3 

4 
5 
6 
7 

8 



if p k +l > then ?; fe+ i «- Vk+i/fa+i else STOP 

fc «- fc + 1 

Pfc = !j4d fe — i«fc 
Pfe <— Pfc — OfcUfc 

"fc+i = Pk— PkVk-l 
fik+l — ||Vfc+l|| 
output: Ve,Ti 



Appendix B. Stopping conditions and norm estimates. 

This section derives several short-recurrence norm estimates in CS-MINRES and 
SH-MINRES. As before, we assume exact arithmetic throughout, so that Vk and Qk are 
orthonormal. Table B.l summarizes how these norm estimates are used to formulate 
six groups of concerted stopping conditions. The second NRBE test is specifically 
designed for LS problems, which have the properties r^O but A*r = 0; it is inspired 
by Stewart [47] and stops the algorithm when ||A*r/s|| is small relative to its upper 
bound P|||H|. 
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Algorithm 4: Algorithm SymOrtho. 



input: a, b 

1 if |fe| = then c = 1, s = 0, r = a 

2 else if |a| = then 

3 C = 0, S = 1, r = 6 

4 else if |6j > |a| then 

5 t — \a\ / \b\, c = 1/vl +T 2 , s = ar(b/a), c <— cr, r = b/s 

6 else if |a| > |6| then 

7 r = |6| / |a|, c = 1/Vl + t 2 , s — c (b/a), r — a/c 
output: c, s, r 



B.l. Residual and residual norm. First we derive recurrence relations for r^ 
and its norm ||r fc || = |<fel- The results are true for CS-MINRES and CS-MINRES-QLP. 

Lemma B.l. Without loss of generality, let xq = 0. The following results follow. 

1. r a = b and \\r \\ = 4>o = fix- 

2. Fork = l,...,£-l, then\\r k \\ = \<j> h \ = \(f>k-i\\s k \ > \<f> h -i\ > 0. Thus \\r k \\ 
is monotonically decreasing. 

3. At the last iteration £, 

(a) If T'd,nk(Li) = £, then \\rt\\ — <f>i = 0. 

(b) Ifra,nk(L e )=£-l, then \\r e \\ = |^_r| > 0. 

Proof. 

1. Obvious. 

2. If k < £, From (3.1)-(3.8) with R k yk = £fc we have 



^fe = Vk+iQt 



t k 



Rk 





2/fc = 0feT4+lQfe e fe+l- ( B -l) 



We have ||r fc || = \tf> k \ = \<f> k -i\\s k \ > 0; see (3.4)-(3.6). 
3. If Ti is nonsingular, rg = 0. Otherwise Qi-i^i has made the last row of Rp 
zero, so the last row and column of Lp are zero; see (4.10). Thus rp = r^_i 7^ 0. 



D 



B.2. Norm of A*rk- For incompatible systems, ru will never be zero. However, 
all LS solutions satisfy A* Ax — A*b, so that A*r — 0. We therefore need a stopping 
condition based on the size of ||j4*rfe|| = ipk- We present efficient recurrence relations 
for ||A*7"fc|| in the following Lemma. We also show that A*rk is orthogonal to JCf.(A, b). 

Lemma B.2 (A*r k and ip k = \\A*r k \\ for CS-MINRES). 

1. If k < £, then rank(L fc ) = k, Ar k = IKIKt/c+i^+i + 4+2^+2) and ip k = 
||r fc ||||[7 fe+ i 4+2]||, where S k+2 = if k = £-1. 

2. At the last iteration £, 

(a) If rank(Li) = £, then Art, = and tpp = 0. 

(b) If Tank(Lp) = t—\, then Art = Ari-i = 0, and ipi = ipt—i = 0. 
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Algorithm 5: Preconditioned CS-MINRES-QLP to solve (A - al)x w b. 
input: A, b, a, M 

1 2o = 0, z\ — b, Solve Mq\ — zi, fii — \/b T qi [Initialize] 

2 wq = w_i = 0, x-2 = X-i = a;o = 

3 Co,l=C ,2=Co,3 = -l, S ,1=S ,2= $0,3=0, 0O=/9l, To = W = X-2 = X-i = Xo =0 

4 <5i = 7-1 = 70 = v-i = Wo —1)1 — -&-i — -do =•&! = M-i = Po = 0, .4. = 0, « = 1 

5 fc = 

6 while no stopping condition is satisfied do 



7 

8 

9 

10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
2.3 
24 
25 
20 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 



fc 4- k + 1 

Pk — Aqt — aqt, ctk = -ffilkVk [Preconditioned Lanczos] 

"k 

Zk+1 = j~ k Pk - %Zk - fazk-! 

Solve Mq k+1 - z k +i, fa+i — J<l k+ i z k+i 

if fc = 1 then p fc = \\[a k fik+i]\\ else p fc = \\[/3 k a k /3fe+i]|| 

^fe c fc-i,i^fc + Sfe_i,iQ(fc [Previous left reflection...] 

7fc = Sfc-l,l<5fc — Cfc_i,iQ!fc [on middle two entries of T k e k ■ . ■] 

£fc+i = Sfc-i,i/3fc+i [produces first two entries in Tk+iek+il 

8k+i = — Cfc_i,iPfc+i 

Cfci, Sfcli7t ■<- SymOrtho(7fc, /3fc+i) [Current left reflection] 

Cfc2,Sfc2,7fc_ 2 4— SymOrtho(7^._ 2 ,efc) [First right reflection] 

s(3) Q i(2) (3) (2) (2) 

Ofc = S k2 V k -l - C k 20 k ', T k "CklTk i W — S *:27fe 

^k-1 = c fc2$fc-l + S fe2 4 2) 



Cfc3,Sfc3,7fc_! <— SymOrtho(7^_ 1 ,^ 3 ') [Second right reflection...] 

a (3) (4) (3) rj _ . c(3), 

Vk = Sfe37fe , 7fc ' — —Chalk *- t0 zero 0ut °k J 

(2) 

r fe = c ki T k [Last element of tkl 

4>k = <t>k-i\ski\, i>k-i = 0fe-i||[7fc 5fc+l]|| [Update ||r fc ||, ||Ar fc _i||] 

if fc = 1 then 7 min = 71 else 7 min «- min{7 m in,7£_ 2 ,7fc_ij TO 1} 
^ fc )=max{^- 1 ),p fc ,7i 6 _ ) 2,7t ) 1 ,!7i 4) |} [Update ||A||] 

Wfc = ||[w fc _i r^ 2) ]||, K^_4 (fc) /7min [Update Px fc ||, »(A)] 

Wfc = -(Cfc 2 /y9fc)9fc + 5fc2Wfc_ 2 [Update w fc _ 2 , Wfc-l, Wfc] 



' J fc 4 -2 = ( s k2//3k)qk + Cfc2«)f 



2 



•f 7 ^ o j-u ( 2 ) ( 2 ) ( 3 ) ( 2 ) 1 

it fc > 2 then w k - s k 3W k _ l - c k3 w k , Wk-i = c «»fc_i + «fc3Wfc 

if fc > 2 then fi[% = (r^ - r) k -2lx { k l A ~ $k-2lu, k %)/f ( k % [Update /a fc _ 2 ] 
if fc > 1 then ^ = (r^ - ^-iM^a " ^iiA&Ar^i Update /i fc _i] 
if 7^. 4) / then fi k = (t^ - r?fc/ii_2 _ ^k^ k 2 -i)hk ) else Mfc = [Compute //*] 



;r 



(2) _ „,(2) _j_ ,,(3) „„(3) 

fc-2 



= ^fe-s + A*fc-2 w fc-2 [Update x k -2~\ 



(2) (2) (3) . (2) r _ , , 

Xk = x k -2 + Mfc-i^fe-i + t^kW k [Compute x k i 

X k % = \\\X k % 4 3 2 2 ]|| [Update ||x fc _ 2 ||] 

Xfe = II [x£- 2 ^k-i A*fc] II [Compute ||xfc||] 



37 i = it, 4> = 4>k-, ip - <t>k\\[-yk+i <5fc+2]||, x = Xk, A = A (k \ ui = ui k 

output: x,(f),ip,x,A,K,,u) 
38 



[c, s 4— Sym0rtho(a, b) is a stable form for computing r — ^Jo? + b 2 , c = 2 ; g = 
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Table B.l 
Stopping conditions in CS-MINRES and SH-MINRES. NRBE means normwise relative back- 
ward error, and tol, maxit, maxcond and maxxnorm are input parameters. All norms and k(A) are 
estimated by CS-MINRES and SH-MINRES. 



Lanczos 



NRBE 



Regularization attempts 



fik+i < n\\A\\e 
k = maxit 



||r fc ||/(||A||||a!ft|| + ||6||)<nua(toi,e) 

|Ar fc ||/(P||||r fc ||)<max(toZ,£) 



k(A) > max.(maxcond , -) 
ll^fell > maxxnorm 



CS-MINRES 



% 



(4) 



< e 



Degenerate cases 

/3i = => x^ = 



= 



rt = 



b/ai 



Erroneous input 
y*(Az)^±z*(Ay) 



Proof. Case 2 follows directly from Lemma B.l. We prove the first case here. For 
k < £, Rk is nonsingular. From (3.1)-(3.8) with R k y k — tk we have 



Ar k = (f> k V k +2Tk+iQte k+1 , by (B.l) 



Qk Tk+l = Qk [Tk+l Pk+2e-k+l\ — Qk 



T k 

Pk+iel 



Pk+i^k 

Of-k+l Pk+2 



el + iQk T k +i T = [0 7 fe+ i 4+2], 
by (3.5). We take 4+2 = if k = £ - 1, so 

Ar k = T k+1 V k+2 [0 7 fe+ i 4 +2 ] = T fe+ i (jk+iVk+i + 5 k+2 Vk+2) , 
^l^\\Ar k \\ 2 = \\r k \\ 2 {[ lk+l ] 2 + [5 k+ 2] 2 ) ■ 

Result follows. D 

Typically ||^4rfc|| is not monotonic, while clearly \\rk\\ is monotonically decreasing. 
In the singular system A = UT,U T , let U = [U± l7 2 ] , where the singular vectors 
U\ correspond to nonzero singular values. Then Pa = U-JJ* and Pjf = U^U^ are 
orthogonal projectors [51] onto the range and nullspace of A. For general linear LS 
problems, Chang et al. [5] characterize the dynamics of ||rfc|| and ||A*rfc|| in three 
phases defined in terms of the ratios among ||j"fc||, ||P,irfc|| and ||Pa t '*:II> anc ^ propose 
two new stopping criteria for iterative solvers. The expositions in [1, 34] show that 
these estimates are cheaply computable in CGLS and LSQR [39, 40]. These results 
are likely applicable to CS-MINRES. 

B.3. Matrix norms. From the Lanczos process, \\A\\ > ||F fe * +1 ^4Vfc|| = ||Tfc||. 
Define 

A {0) = 0, A {k) = max fllTjejH} = max {a^K \\Tk_e k \\ } for k > 1. (B.2) 

Then ||j4|| > ||Tfc|| > A^ k '. Clearly, A^ is monotonically increasing and is thus an 
improving estimate for ||j4|| as k increases. By the property of QLP decomposition 
in (2.7) and (4.3), we could easily extend (B.2) to include the largest diagonal of L k : 

A<® = 0, A™ = maxj^- 1 ', ||I^ efc ||, 7 fJ 2 , 7^, | 7 f |} for k > 1, (B.3) 

which uses quantities readily available from CS-MINRES and gives satisfactory, if not 
extremely accurate, estimates for the order of ||-A||. 
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B.4. Matrix condition numbers. We again apply the property of the QLP 
decomposition in (2.7) and (4.3) to estimate K(T k ), which is a lower bound for k(A): 

7min <- min{7i,7^ 2 ) }, 7 min <- taia{y mia , ^} 2 , %} lt ^W for k > 3 > 
K (0) = l, K W Emax( K M,— 1 for k> 1. (B.4) 

I 7min J 

B.5. Solution norms. For CS-MINRES-QLP, we derive a recurrence relation for 
||xfe|| whose cost is as low as computing the norm of a 3- or 4- vector. This recurrence 
relation is not applicable to CS-MINRES standalone. 

Since ||xfc|| = \\V k P k u k \\ = ||u fc ||, we can estimate ||x fc || by computing Xk = ||ufc||. 
However, the last two elements of u k change in u k +i (and a new element fik+i is 
added). We therefore maintain Xk-2 by updating it and then using it according to 

X k % = IILxL-3 All Xk = \\[xi% 4-1 M*]|| cf. (4.7) and (4.8). 

Thus Xk-2 increases monotonically but we cannot guarantee that \\x k \\ and its recurred 

estimate Xk are increasing, and indeed they are not in some examples. But the trend 

for Xk is generally increasing, and x k i s theoretically a better estimate than Xk 

for ||a;fc||. In LS problems, when 7^ is small enough in magnitude, it also means 

\\ x k\\ = \\yk\\ = \\uk\\ is large — and when this quantity is larger than maxxnorm, it 

(2) (3) (2) 

usually means that we should do only a partial update of x k = x k _ 2 + w k _ 1 n k _ 1 , 

(2) 
which if still exceeds maxxnorm in length, then we shall do no update, i.e., x k = x k _ 2 - 

B.6. Projection norms. In applications requiring nullvectors [7], Ax k is useful. 
Other times, the projection of the right-hand side b onto Kk(A,b) is required [42]. For 
the recurrence relations of Ax k and its norm, we have 



Ax k = AVkVk = V k+1 T k y k = V k+1 Q* k 



Rk 





Vk = V k+ iQ* k 



ui = \\ Axk r = \\t k r = \\ tk _,t + (rf r = <4-i + (4 2) ) 2 = UK-, rf>]n a . 

Thus {cu k } is monotonic. 

Appendix C. Comparison of Lanczos-based solvers. 

We compare our new solvers with CG, SYMMLQ, MINRES, and MINRES-QLP in 
Tables C.1-C.2 in terms of subproblem definitions, basis, solution estimates, flops, 
and memory. A careful implementation of SYMMLQ computes x k in range(Vfc+i); 
see [7, Section 2.2.2] for a proof. All solvers need storage for v kl v k+i , x k , and a 
product p k — Av k or Av k each iteration. Some additional work-vectors are needed 
for each method (e.g., d fc _i and d k for MINRES or CS-MINRES, giving 7 work- vectors 
in total). It is noteworthy that even for Hermitian and skew Hermitian problems 
Ax = b, the subproblems of CG, SYMMLQ, MINRES, and MINRES-QLP are real. 
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Table C.l 
Subproblems defining x k for CG, SYMMLQ, MINRES, MINRES-QLP, CS-MINRES, CS- 
MINRES-QLP, SH-MINRES, and SH-MINRES-QLP. 



Method 


Subproblem 


Factorization 


Estimate of x^ 


cgLanczos 
[27, 38, 44] 


TkVk = /3iei 


Cholesky: 
Tk = L k D k L k 


£fc = V k yk 
e range (V fc ) 


SYMMLQ 

[38, 7] 


y k +i = argmin^Rk+i ||y|| 
s.t. Ify = /3 iei 


LQ: 

Tk T Q T k = [L k 0] 


e range(Vfe +1 ) 


MINRES 

[38], [7]- [11] 


y k = arg min \\T k y - /3iei|| 

ySR fc 


QR: 


~Rk 





Xk = Vfc^fc 
G range (V^) 


MINRES-QLP 

[7]-[H] 


y k = argmin yeRfc \\y\\ 

s.t. y g arg min T^/ - /3iei || 


QLP: 

QkTkPk = 


Lk 





S range (Vfe) 


CS-MINRES 


y fe = arg min \\T k y - /3iei| 
2/ec fc 


QR: 

QfcTfc = 


Rk 






Xk = V k y k 
e range(T4) 


CS-MINRES-QLP 


y fe = argmin^gefc ||y|| 

s.t. ye arg min ||Ty/ - ftei || 


QLP: 

QkTkPk = 


0_ 




Xk = V k y k 
e range (T4) 


SH-MINRES 


y fe = arg min ||T fc y - /?iei|| 

y6R fc 


QR: 

QkTk = 


-Rfe 






Xk = V k y k 
e range (T4) 


SH-MINRES-QLP 


y fe = argmin yeRfc \\y\\ 

s.t. ye argmin||Tfcy-/3iei|| 


QLP: 

QkTkPk = 







Xk = V k y k 
G range (Vfe) 



Bases, 



Table C.2 
era solutions, storage, and work for each method. 



Method 


New basis 


Zk,tk,U k 


x k estimate 


vecs flops 


cgLanczos 


W k = V k L^ T 


L k D k z k = /3iei 


x k =W k z k 


5 8n 


SYMMLQ 


W k = V k+1 Ql 


'Ik 




L k z k =0iei 


x k =W k z k 


6 9n 


MINRES 


D k = V k R^ 


t k =fa[l k 0]Q kei 


x k =D k tk 


7 9n 


MINRES-QLP 


w k = V k P k 


L k u k = lii[l k 0]Q fc ei 


x k = W k u k 


8 Un 


CS-MINRES 
SH-MINRES 


D k = VkR^ 1 


t k = Pi[h Q]Q k e x 


Xk=D k t k 


7 9n 


CS-MINRES-QLP 
SH-MINRES-QLP 


w k = V k P k 


L k u k = Pi[l k OJQfcei 


x k =W k u k 


8 Un 
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